tibble [9,994 x 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Ship Mode : chr [1:9994] "Second Class" "Second Class" "Second Class" "Standard Class" ...
$ Segment : chr [1:9994] "Consumer" "Consumer" "Corporate" "Consumer" ...
$ Country : chr [1:9994] "United States" "United States" "United States" "United States" ...
$ City : chr [1:9994] "Henderson" "Henderson" "Los Angeles" "Fort Lauderdale" ...
$ State : chr [1:9994] "Kentucky" "Kentucky" "California" "Florida" ...
$ Postal Code : num [1:9994] 42420 42420 90036 33311 33311 ...
$ Region : chr [1:9994] "South" "South" "West" "South" ...
$ Category : chr [1:9994] "Furniture" "Furniture" "Office Supplies" "Furniture" ...
$ Sub-Category: chr [1:9994] "Bookcases" "Chairs" "Labels" "Tables" ...
$ Sales : num [1:9994] 262 731.9 14.6 957.6 22.4 ...
$ Quantity : num [1:9994] 2 3 2 5 2 7 4 6 3 5 ...
$ Discount : num [1:9994] 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
$ Profit : num [1:9994] 41.91 219.58 6.87 -383.03 2.52 ...
- attr(*, "spec")=
.. cols(
.. `Ship Mode` = col_character(),
.. Segment = col_character(),
.. Country = col_character(),
.. City = col_character(),
.. State = col_character(),
.. `Postal Code` = col_double(),
.. Region = col_character(),
.. Category = col_character(),
.. `Sub-Category` = col_character(),
.. Sales = col_double(),
.. Quantity = col_double(),
.. Discount = col_double(),
.. Profit = col_double()
.. )
Exploratory Data Analysis
Data Description Data contains Sales records from a Super-Store located in USA Ship Mode- The mode in which the product was shipped. 4 Possible Modes “First Class” “Same Day” “Second Class” “Standard Class”
*Segment- The Sale segment +3 Possible Segments +“Consumer” “Corporate” “Home Office”
*Country- Country in which the sale was made. (United States)
*City- The city in which the sale was made .
*State- State in which the sale was made.
*Postal Code- Postal Code for the region of the sale.
Region - The Region in which the sale was made. +4 possible values + “South” “West” “Central” “East” Category- The category of product which was purchased. +3 possible values +“Furniture” “Office Supplies” “Technology” *Sub Category- A more in-depth category of update +17 possible values
*Sales- Amount of money transaction for the Sale
*Quantity- Quantity of products involved in the sale
*Discount- Discount applied on the sale
*Profit- Profit made from the sale
Describe the data# A tibble: 4 x 26
variable n na mean sd se_mean IQR skewness kurtosis p00
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Discount 9994 0 0.156 0.206 0.00207 0.2 1.68 2.41 0.
2 Quantity 9994 0 3.79 2.23 0.0223 3 1.28 1.99 1.00e+0
3 Profit 9994 0 28.7 234. 2.34 27.6 7.56 397. -6.60e+3
4 Sales 9994 0 230. 623. 6.23 193. 13.0 305. 4.44e-1
# ... with 16 more variables: p01 <dbl>, p05 <dbl>, p10 <dbl>, p20 <dbl>,
# p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>, p60 <dbl>, p70 <dbl>,
# p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>, p99 <dbl>, p100 <dbl>
Overview of the data
# A tibble: 1 x 2
Sale Profit
<dbl> <dbl>
1 2297201. 286397.
Number of Sales
# A tibble: 6 x 3
State N profit
<chr> <int> <dbl>
1 California 2001 38.2
2 New York 1128 65.6
3 Texas 985 -26.1
4 Pennsylvania 587 -26.5
5 Washington 506 66.0
6 Illinois 492 -25.6
[1] 9994
Breakdown of Number of Sales State-Wise sale % contribution
# A tibble: 49 x 2
State PerCent
<chr> <dbl>
1 California 20.0
2 New York 11.3
3 Texas 9.86
4 Pennsylvania 5.87
5 Washington 5.06
6 Illinois 4.92
7 Ohio 4.69
8 Florida 3.83
9 Michigan 2.55
10 North Carolina 2.49
# ... with 39 more rows
City-Wise sale % Contribution
# A tibble: 531 x 2
City PerCent
<chr> <dbl>
1 New York City 9.16
2 Los Angeles 7.47
3 Philadelphia 5.37
4 San Francisco 5.10
5 Seattle 4.28
6 Houston 3.77
7 Chicago 3.14
8 Columbus 2.22
9 San Diego 1.70
10 Springfield 1.63
# ... with 521 more rows
Profit Breakdown
# A tibble: 49 x 3
State Total_Profit SD
<chr> <dbl> <dbl>
1 California 76381. 97.7
2 New York 74039. 233.
3 Washington 33403. 325.
4 Michigan 24463. 375.
5 Virginia 18598. 235.
6 Indiana 18383. 694.
7 Georgia 16250. 283.
8 Kentucky 11200. 171.
9 Minnesota 10823. 506.
10 Delaware 9977. 519.
# ... with 39 more rows
Avg-Discounts (State Wise)
# A tibble: 6 x 2
State Avg
<chr> <dbl>
1 Illinois 0.390
2 Texas 0.370
3 Pennsylvania 0.329
4 Ohio 0.325
5 Colorado 0.316
6 Arizona 0.304
# A tibble: 6 x 2
State Total
<chr> <dbl>
1 Arizona 35282.
2 Colorado 32108.
3 Illinois 80166.
4 Ohio 78258.
5 Pennsylvania 116512.
6 Texas 170188.
Which Category of Sales brings the maximum sales and profits?
# A tibble: 3 x 3
Category Total_Sale Total_Profit
<chr> <dbl> <dbl>
1 Furniture 742000. 18451.
2 Office Supplies 719047. 122491.
3 Technology 836154. 145455.
# A tibble: 46 x 2
State N
<chr> <int>
1 California 359
2 New York 208
3 Texas 179
4 Pennsylvania 119
5 Ohio 102
6 Washington 100
7 Illinois 84
8 Florida 69
9 North Carolina 51
10 Arizona 48
# ... with 36 more rows
Number of Sales for each Category
# A tibble: 3 x 2
Category `n()`
<chr> <int>
1 Furniture 2121
2 Office Supplies 6026
3 Technology 1847
In more Detail
# A tibble: 142 x 3
# Groups: State [49]
State Category N
<chr> <chr> <int>
1 California Office Supplies 1198
2 New York Office Supplies 684
3 Texas Office Supplies 604
4 California Furniture 444
5 California Technology 359
6 Pennsylvania Office Supplies 343
7 Washington Office Supplies 292
8 Illinois Office Supplies 285
9 Ohio Office Supplies 274
10 New York Furniture 236
# ... with 132 more rows
Check what % of sales were discounted for big sale spots (States)
# A tibble: 20 x 5
State T_S T_D T_P Percent
<chr> <dbl> <dbl> <dbl> <dbl>
1 California 457688. 146. 76381. 0.0318
2 New York 310876. 62.4 74039. 0.0201
3 Washington 138641. 32.4 33403. 0.0234
4 Michigan 76270. 1.8 24463. 0.00236
5 Virginia 70637. 0 18598. 0
6 Indiana 53555. 0 18383. 0
7 Georgia 49096. 0 16250. 0
8 Kentucky 36592. 0 11200. 0
9 Minnesota 29863. 0 10823. 0
10 New Jersey 35764. 0.6 9773. 0.00168
11 Wisconsin 32115. 0 8402. 0
12 Florida 89474. 115. -3399. 0.128
13 Arizona 35282. 68 -3428. 0.193
14 Tennessee 30662. 53.3 -5342. 0.174
15 Colorado 32108. 57.6 -6528. 0.179
16 North Carolina 55603. 70.6 -7491. 0.127
17 Illinois 80166. 192. -12608. 0.239
18 Pennsylvania 116512. 193. -15560. 0.166
19 Ohio 78258. 152. -16971. 0.195
20 Texas 170188. 365. -25729. 0.214
Conclusion We see the following Trends in the data: States with higher sales have lower Percentage of Discounts States with higher Discount Rates are actually losing the Store money
Suggestion: Increase Prices in States with Discount % more than 0.1 Most States with high sales don’t have any discounts applied, following this the top 10 sale states should have 0 Discounts to bring in more profit.
Our most valuable Category is Office supplies in the high Sales states, thus we should increase stock and prices of this category in the same states
California and New York are the two most important locations for this store, it would be advised to focus their attention and expand in these two location as these have the most promise.
---
title: "Task_5 Dashboard"
output:
flexdashboard::flex_dashboard:
vertical_layout: scroll
source_code: embed
theme: cerulean
---
Calculations/Tables
=====================================
```{r setup, include=FALSE}
library(flexdashboard)
```
##Loading Required Libraries
```{r}
library(gsheet)#Access data from docs
library(dlookr)
library(dplyr)
library(ggplot2)
library(forcats)
library(magrittr)
library(treemap)
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)
library(leaflet)
library(widgetframe)
library(doBy)
library(viridisLite)
library(plotly)
```
###Downloading Dataset
```{r}
superstore <- gsheet2tbl('https://docs.google.com/spreadsheets/d/1ELHYd5wKnLdJE3tuszI-Ced_hyye5CnOnArshR_UGPI/edit?usp=sharing')
str(superstore)
```
**Exploratory Data Analysis**
*Data Description*
Data contains Sales records from a Super-Store located in USA
Ship Mode- The mode in which the product was shipped.
4 Possible Modes
"First Class" "Same Day" "Second Class" "Standard Class"
*Segment- The Sale segment
+3 Possible Segments
+"Consumer" "Corporate" "Home Office"
*Country- Country in which the sale was made. (United States)
*City- The city in which the sale was made .
*State- State in which the sale was made.
*Postal Code- Postal Code for the region of the sale.
*Region - The Region in which the sale was made.
+4 possible values
+ "South" "West" "Central" "East"
*Category- The category of product which was purchased.
+3 possible values
+"Furniture" "Office Supplies" "Technology"
*Sub Category- A more in-depth category of update
+17 possible values
*Sales- Amount of money transaction for the Sale
*Quantity- Quantity of products involved in the sale
*Discount- Discount applied on the sale
*Profit- Profit made from the sale
**Describe the data**
```{r}
describe(superstore,Discount,Quantity,Profit,Sales)
```
**Overview of the data**
```{r}
Overview<-superstore %>% summarise(Sale=sum(Sales),Profit=sum(Profit))
Overview
```
**Number of Sales**
```{r}
#Get the number of sales made in each state to find useful information
No_sales<-superstore %>% group_by(State) %>% summarise(N=n(),profit=mean(Profit)) %>% arrange(desc(N)) %>% head()
No_sales
#Total Sales made:
nrow(superstore)
```
**Breakdown of Number of Sales**
*State-Wise sale % contribution*
```{r}
#State-Wise sale % contribution
superstore %>% group_by(State) %>% summarise(PerCent=n()*100/nrow(superstore)) %>% arrange(desc(PerCent))
```
*City-Wise sale % Contribution*
```{r}
#City-Wise sale % Contribution
superstore %>% group_by(City) %>% summarise(PerCent=n()*100/nrow(superstore)) %>% arrange(desc(PerCent))
```
**Profit Breakdown**
```{r}
#Find the states that are most profitable
superstore %>% group_by(State) %>% summarise(Total_Profit=sum(Profit),SD=sd(Profit)) %>% arrange(desc(Total_Profit))
```
**Avg-Discounts (State Wise)**
```{r}
dis<-superstore %>% group_by(State) %>% summarise(Avg=mean(Discount)) %>% arrange(desc(Avg)) %>% head()
dis
superstore %>% group_by(State) %>% filter(State%in%dis$State) %>% summarise(Total=sum(Sales))
sales<-superstore %>% group_by(State) %>% summarise(Total=sum(Sales)) %>% arrange(desc(Total))
#Find the states which bring the most amount of money in sales
valued_states<-sales$State[1:20]
```
**Which Category of Sales brings the maximum sales and profits? **
```{r}
superstore %>% group_by(Category) %>% summarise(Total_Sale=sum(Sales),Total_Profit=sum(Profit))
superstore %>% group_by(State) %>% filter(Category=="Technology") %>% summarise(N=n()) %>% arrange(desc(N))
#We see California pops up again as the biggest customer
```
**Number of Sales for each Category**
```{r}
superstore %>% group_by(Category) %>% summarise(n())
```
**In more Detail**
```{r}
Sold_Category<-superstore %>% group_by(State,Category) %>% summarise(N=n()) %>% arrange(desc(N))
Sold_Category
```
**Check what % of sales were discounted for big sale spots (States)**
```{r}
superstore %>% group_by(State) %>% filter(State%in%valued_states) %>%
summarise(T_S=sum(Sales),T_D=sum(Discount),T_P=sum(Profit),Percent=T_D*100/T_S) %>% arrange(desc(T_P))
```
Representations {data-orientation=rows}
=====================================
Row
-------------------------------------
### Sale Hot-Spots
```{r}
cities<-data.frame(S=c("California","New York","Texas","Pennsylvania"),lat=c(35.8863,43.1007,29.4128,41.2033),lng=c(-118.656,-75.2932,-94.9658,-77.1945))
cities<-setNames(cities,c("State","lat","lng"))
temp<-merge(cities,No_sales)[1:4,1:4]
map <- leaflet() %>%
setView(lat = 36, lng = -99.7129, zoom=4) %>%
addTiles(group="OSM") %>%
addProviderTiles("Esri.NatGeoWorldMap") %>%
addCircleMarkers(data =temp, ~lng, ~lat,weight = 0.5, col = 'black',fillColor = "purple", radius =~ N/100 , fillOpacity = 0.6, stroke = T, label = ~paste0(as.character(State),' ',as.character(N)),group = 'Points')
map
```
Row
-------------------------------------
### Treemap
```{r}
treemap(superstore,index=c("Category","Sub-Category"),vSize = "Sales",vColor = "Profit",type = "value",palette = rev(viridis(6)))
```
### WordCloud
```{r}
wordcloud(words = superstore$`Sub-Category`,min.freq = 1,max.words = 100,random.order = F,rot.per = 0.35,colors = brewer.pal(8,'Set2'))
```
Number of Sales {data-orientation=rows}
=====================================
Row {data-width=650}
-------------------------------------
### State Wise
```{r}
t<-superstore %>% group_by(State) %>% summarise(N=n(),profit=mean(Profit)) %>% arrange(desc(N))
fig <- plot_ly(
x =t$State[1:6],
y = t$N[1:6],
type = 'bar',
text = t$State[1:6],
marker=list(color=c("red","red","silver","silver","silver","silver"))
)
fig <- fig %>% layout(
title = 'State-Wise Sale Distribution',
xaxis = list(
type = 'category',
title = 'State'
),
yaxis = list(
title = '# of Sales'
)
)
fig
```
Row {.tabset .tabset-fade}
-------------------------------------
### City Wise
```{r}
t<-superstore %>% group_by(City) %>% summarise(N=n(),profit=mean(Profit)) %>% arrange(desc(N))
fig <- plot_ly(
x =t$City[1:6],
y = t$N[1:6],
type = 'bar',
text = t$State[1:6],
marker=list(color=c("red","red","silver","silver","silver","silver"))
)
fig <- fig %>% layout(
title = 'City-Wise Sale Distribution',
xaxis = list(
type = 'category',
title = 'City'
),
yaxis = list(
title = '# of Sales'
)
)
fig
```
### Region Wise
```{r}
t<-superstore %>% group_by(Region) %>% summarise(N=n()) %>% arrange(desc(N))
fig <- plot_ly(
x =t$Region,
y = t$N,
type = 'bar',
text = t$Region,
marker=list(color=c("red","silver","silver","silver"))
)
fig <- fig %>% layout(
title = 'Region-Wise Sale Distribution',
xaxis = list(
type = 'category',
title = 'Region'
),
yaxis = list(
title = '# of Sales'
)
)
fig
```
### Ship-Mode Wise
```{r}
t<-superstore %>% group_by(`Ship Mode`) %>% summarise(N=n()) %>% arrange(desc(N))
fig <- plot_ly(
x =t$`Ship Mode`,
y = t$N,
type = 'bar',
text = t$`Ship Mode`,
marker=list(color=c("red","silver","silver","silver"))
)
fig <- fig %>% layout(
title = 'Ship_Mode-Wise Sale Distribution',
xaxis = list(
type = 'category',
title = 'Ship Mode'
),
yaxis = list(
title = '# of Sales'
)
)
fig
```
### Segment Wise
```{r}
data<-superstore %>% group_by(Segment) %>% summarise(N=n())
fig <- plot_ly(data, labels = ~Segment, values = ~N, type = 'pie',
textposition = 'inside',
textinfo = 'label+percent',
insidetextfont = list(color = '#FFFFFF'),
hoverinfo = 'text',
text = ~paste( N, ' Sales'),showlegend = FALSE,
marker=list(line = list(color = '#FFFFFF', width = 1)))
fig <- fig %>% layout(title = '# of Sales Segment-Wise',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig
```
### Category Wise
```{r}
data<-superstore %>% group_by(Category) %>% summarise(N=n())
fig <- plot_ly(data, labels = ~Category, values = ~N, type = 'pie',
textposition = 'inside',
textinfo = 'label+percent',
insidetextfont = list(color = '#FFFFFF'),
hoverinfo = 'text',
text = ~paste( N, ' Sales'),showlegend = FALSE,
marker=list(line = list(color = '#FFFFFF', width = 1)))
fig <- fig %>% layout(title = '# of Sales Category-Wise',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig
```
Sales(Amount) {data-orientation=rows}
=====================================
Row {data-width=650}
-------------------------------------
### State-Wise Division
```{r}
t<-superstore %>% group_by(State) %>% summarise(N=sum(Sales)) %>% arrange(desc(N))
fig <- plot_ly(
x =t$State[1:6],
y = t$N[1:6],
type = 'bar',
text = t$State[1:6],
marker=list(color=c("red","red","silver","silver","silver","silver"))
)
fig <- fig %>% layout(
title = 'State-Wise Sale Distribution',
xaxis = list(
type = 'category',
title = 'State'
),
yaxis = list(
title = '$ in Sales'
)
)
fig
```
Row {.tabset .tabset-fade}
-------------------------------------
### Category Wise
```{r}
data<-superstore %>% group_by(Category) %>% summarise(N=sum(Sales))
fig <- plot_ly(data, labels = ~Category, values = ~N, type = 'pie',
textposition = 'inside',
textinfo = 'label+percent',
insidetextfont = list(color = '#FFFFFF'),
hoverinfo = 'text',
text = ~paste( "$",N, 'in Sales'),showlegend = FALSE,
marker=list(line = list(color = '#FFFFFF', width = 1)))
fig <- fig %>% layout(title = 'Amount of Sale Category Wise',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig
```
### City Wise
```{r}
t<-superstore %>% group_by(City) %>% summarise(N=sum(Sales)) %>% arrange(desc(N))
fig <- plot_ly(
x =t$City[1:6],
y = t$N[1:6],
type = 'bar',
text = t$State[1:6],
marker=list(color=c("red","red","silver","silver","silver","silver"))
)
fig <- fig %>% layout(
title = 'City-Wise Sale Distribution',
xaxis = list(
type = 'category',
title = 'City'
),
yaxis = list(
title = '# of Sales'
)
)
fig
```
### Region Wise
```{r}
t<-superstore %>% group_by(Region) %>% summarise(N=sum(Sales)) %>% arrange(desc(N))
fig <- plot_ly(
x =t$Region,
y = t$N,
type = 'bar',
text = t$Region,
marker=list(color=c("red","silver","silver","silver"))
)
fig <- fig %>% layout(
title = 'Region-Wise Sale Distribution',
xaxis = list(
type = 'category',
title = 'Region'
),
yaxis = list(
title = '$ of Sales'
)
)
fig
```
### Ship-Mode Wise
```{r}
t<-superstore %>% group_by(`Ship Mode`) %>% summarise(N=sum(Sales)) %>% arrange(desc(N))
fig <- plot_ly(
x =t$`Ship Mode`,
y = t$N,
type = 'bar',
text = t$`Ship Mode`,
marker=list(color=c("red","silver","silver","silver"))
)
fig <- fig %>% layout(
title = 'Ship_Mode-Wise Sale Distribution',
xaxis = list(
type = 'category',
title = 'Ship Mode'
),
yaxis = list(
title = '$ of Sales'
)
)
fig
```
### Segment Wise
```{r}
data<-superstore %>% group_by(Segment) %>% summarise(N=sum(Sales))
fig <- plot_ly(data, labels = ~Segment, values = ~N, type = 'pie',
textposition = 'inside',
textinfo = 'label+percent',
insidetextfont = list(color = '#FFFFFF'),
hoverinfo = 'text',
text = ~paste( N, '$ in Sales'),showlegend = FALSE,
marker=list(line = list(color = '#FFFFFF', width = 1)))
fig <- fig %>% layout(title = '$ of Sales Segment-Wise',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
fig
```
Conclusion {data-orientation=rows}
=====================================
**Conclusion**
We see the following Trends in the data:
States with higher sales have lower Percentage of Discounts
States with higher Discount Rates are actually losing the Store money
**Suggestion:**
Increase Prices in States with Discount % more than 0.1
Most States with high sales don't have any discounts applied, following this the top 10 sale states should have 0 Discounts to bring in more profit.
Our most valuable Category is Office supplies in the high Sales states, thus we should increase stock and prices of this category in the same states
California and New York are the two most important locations for this store, it would be advised to focus their attention and expand in these two location as these have the most promise.